Skip to content

feat(webui): upload images/PDFs from the chat composer#579

Merged
yaozheng-fang merged 2 commits into
mainfrom
feat/composer-file-upload
Jun 4, 2026
Merged

feat(webui): upload images/PDFs from the chat composer#579
yaozheng-fang merged 2 commits into
mainfrom
feat/composer-file-upload

Conversation

@yaozheng-fang
Copy link
Copy Markdown
Collaborator

What

The web UI composer's + button was inert. It now opens a small upload menu
(上传图片 / 上传文件 (PDF), ChatGPT-style rounded card) and lets users
attach images and PDFs to a chat turn, aligned with the existing
/run_sse endpoint.

How

Frontend (frontend/src/)

  • + toggles a popover with two upload actions backed by hidden
    <input type="file"> (image/* and application/pdf).
  • Files are read to base64 and sent as inline_data parts on
    new_messagerunSSE now prepends attachment parts before the text part.
  • Pending attachments render as rounded image thumbnails / PDF chips (with an ×
    to remove); sent turns render the same, and history is reconstructed from
    inline_data parts so reloaded sessions show them.
  • Per-file cap (~20 MB) with an error message.
  • Rebuilt veadk/webui (committed).

Backend

  • Images already reach the model via ADK's LiteLlm image_url data-URI path
    — no change needed.
  • New veadk/utils/pdf_to_images.py: a before_model_callback that renders
    each application/pdf part to one image/png part per page via pypdfium2,
    so a vision-capable model reads it (effectively OCR; scanned PDFs included).
    Page count is capped (default 10) to bound token cost; pypdfium2/pillow are
    lazy-imported with a clear "install veadk-python[pdf]" error.
  • Added a [pdf] extra (pypdfium2 + pillow, both permissive licenses — not
    PyMuPDF/AGPL).
  • Wired onto the a2ui_agent and basic-app demo agents; basic-app installs
    [a2ui,pdf] and its README documents the attachments feature + vision-model
    requirement.

Verification

  • tests/test_pdf_to_images.py (3 tests): PDF part replaced by one image/png
    per page, original text preserved; max_pages respected; non-PDF requests
    untouched. All pass.
  • Pyright + Ruff clean on the new Python; npm run build (tsc + vite) succeeds;
    markdownlint clean on the edited READMEs.

Notes

  • The agent model must be vision-capable (default doubao-seed-1.6 is);
    non-vision models can't consume images or PDF-as-images.
  • Out of scope (follow-up): literal PDF→text extraction, drag-&-drop,
    paste-to-upload.

The composer "+" button now opens an upload menu (上传图片 / 上传文件 PDF).
Selected files are read to base64 and sent as inline_data parts on the
existing /run_sse new_message, shown as rounded thumbnails / PDF chips both
while pending and on the sent user turn (also reconstructed from history).

Images already reach the model via ADK's LiteLlm image_url path. PDFs are
handled by a new before_model_callback (veadk.utils.pdf_to_images) that
renders each page to an image/png part with pypdfium2 so a vision-capable
model can read them (effectively OCR, scanned PDFs included). Page count is
capped (default 10) to bound token cost. Wired onto the a2ui_agent and
basic-app demo agents; added a [pdf] extra (pypdfium2 + pillow).
Two fixes found while testing image/PDF upload:

- History images showed only the filename instead of the picture. ADK
  serialises inline_data bytes as URL-safe base64 (-_), but a data: URI needs
  standard base64 (+/), so the reloaded <img> failed and fell back to its alt
  text. Normalise base64url -> base64 in attachmentsFromParts.
- A failed turn (e.g. a non-vision model rejecting an image) is delivered as a
  `data: {"error": ...}` SSE frame, which the client ignored — the turn just
  rendered nothing. Surface it as the error banner and stop the stream.
@yaozheng-fang yaozheng-fang merged commit 611e1a9 into main Jun 4, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants